Massive database view of all place names

Kelly Jones

Apr 8, 2009, 10:10:32 AM
to geon...@googlegroups.com
I'm trying to create a db view that has all possible names for all
possible places. Has anyone already done this, or is anyone working on
it?

Example: Albuquerque, NM, USA can be written in many ways, including:

% Just plain "Albuquerque".

% "Albuquerque, USA": city and country, no state name

% "Albuquerque, Nuevo Mexico": city and alternate name for state

% "Albuquerque, Bernalillo County, New Mexico, USA": includes county

% "Alburquerque, Comte de Bernalillo, USA": alternate city name,
ASCII-fied alternate county name.

% "Albuquerque, NM": city name and state postal code

% "Abq, NM": commonly used abbreviation

% "The Duke City": city nickname (I don't think geonames has this)

% "Albuquerque, Land of Enchantment, USA": state nickname (I don't
think geonames has this)

Basically, this would be a giant JOIN, combining all names for
Albuquerque with all names for the county it's in (but allowing for
empty string as county), with all names for the state (again allowing
for the empty string), etc.
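
A minimal sketch of that JOIN, done in Python instead of SQL so it is easy to try: take one name per level (city, county, state, country), let the empty string stand for "level omitted", and form every combination. The name lists below are hypothetical sample data, not pulled from geonames.

```python
from itertools import product

# Hypothetical sample data: alternate names per level of the hierarchy.
# An empty string means "this level may be omitted".
city_names = ["Albuquerque", "Alburquerque", "Abq"]
county_names = ["", "Bernalillo County", "Comte de Bernalillo"]
state_names = ["", "New Mexico", "NM", "Nuevo Mexico"]
country_names = ["", "USA"]


def all_place_strings(*levels):
    """Yield every combination of one name per level, skipping empty parts."""
    for combo in product(*levels):
        # Drop omitted levels so we get "Albuquerque, NM", not "Albuquerque, , NM, ".
        yield ", ".join(part for part in combo if part)


variants = list(all_place_strings(city_names, county_names, state_names, country_names))
print(len(variants))  # 72 (3 * 3 * 4 * 2)
print(variants[:3])   # ['Albuquerque', 'Albuquerque, USA', 'Albuquerque, New Mexico']
```

The row count is the product of the per-level name counts, so the view grows multiplicatively with every alternate name added at any level.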

As a final step, I'd remove non-alpha characters (including spaces) and
lowercase everything for normalized text searching (e.g.,
"albuquerquenmusa").
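
The normalization step could look like the sketch below. Folding accents first (so "Comté" and "Comte" produce the same key) is an added assumption on top of the step described above; drop those two lines if you only want lowercasing and stripping.

```python
import unicodedata


def normalize(text: str) -> str:
    """Lowercase and keep only alphabetic characters, for exact-match lookups."""
    # Assumption (optional): fold accents via NFKD and drop the combining marks,
    # so accented and ASCII-fied spellings map to the same key.
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The step described above: remove non-alpha characters (spaces included), lowercase.
    return "".join(ch for ch in without_marks.lower() if ch.isalpha())


print(normalize("Albuquerque, NM, USA"))  # albuquerquenmusa
```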

This seems like an obvious thing to do. Has anyone done it, or is anyone
working on something like this?

--
We're just a Bunch Of Regular Guys, a collective group that's trying
to understand and assimilate technology. We feel that resistance to
new ideas and technology is unwise and ultimately futile.

Marc Wick

Apr 12, 2009, 4:10:16 AM
to geon...@googlegroups.com
If you want to use it for search, then I would look for a full-text search
library for your programming environment. Full-text search libraries
are better suited to search than a large database join over all possible
combinations with a LIKE query on it.
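
The suggestion here is to use an existing full-text search library. The toy inverted index below is only a hypothetical illustration of the idea behind such libraries, not any library's API: each name is tokenized once at index time, and a query intersects the small sets of place ids for its tokens instead of scanning every precomputed combination. The sample data and function names are made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical sample data: place id -> known names for the place and its parents.
PLACES = {
    1: ["Albuquerque", "Alburquerque", "Abq", "Bernalillo County", "New Mexico", "NM", "USA"],
    2: ["Santa Fe", "Santa Fe County", "New Mexico", "NM", "USA"],
    3: ["Albany", "New York", "NY", "USA"],
}


def tokenize(text):
    """Split text into lowercase runs of letters (digits and punctuation dropped)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def build_index(places):
    """Map each token to the set of place ids whose names contain it."""
    index = defaultdict(set)
    for place_id, names in places.items():
        for name in names:
            for token in tokenize(name):
                index[token].add(place_id)
    return index


def search(index, query):
    """Return ids of places matching every token in the query (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


index = build_index(PLACES)
print(search(index, "Albuquerque, NM"))  # {1}
print(search(index, "New Mexico, USA"))  # {1, 2}
```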

Marc

Even Scharning

Apr 13, 2009, 8:30:28 AM
to GeoNames
Hi Kelly,

I suggest you try chopping up the input string, to feed your database
server one piece at a time. For a query like "Albuquerque, New Mexico,
USA", first find "USA", then "New Mexico" (within USA), and finally
"Albuquerque" (within New Mexico).
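
The Time.is routine itself isn't shown, so here is only a sketch of the chop-and-narrow idea under assumed data: a hypothetical PLACES table of parent ids and normalized names. The comma-separated pieces are resolved from last (broadest) to first (narrowest), and each step keeps only matches lying somewhere under the previous result, so an omitted level such as the county or state doesn't break the lookup.

```python
# Hypothetical sample hierarchy: id -> (parent id, set of normalized names).
# None marks the top level. Real data would come from a gazetteer table.
PLACES = {
    1: (None, {"usa", "unitedstates"}),
    2: (1, {"newmexico", "nm", "nuevomexico"}),
    3: (2, {"bernalillocounty", "bernalillo"}),
    4: (3, {"albuquerque", "alburquerque", "abq"}),
    5: (1, {"newyork", "ny"}),
    6: (5, {"albany"}),
}


def normalize(piece):
    """Lowercase and keep only letters."""
    return "".join(ch for ch in piece.lower() if ch.isalpha())


def is_within(place_id, ancestor_id):
    """True if ancestor_id is None (no constraint) or is an ancestor of place_id."""
    if ancestor_id is None:
        return True
    parent = PLACES[place_id][0]
    while parent is not None:
        if parent == ancestor_id:
            return True
        parent = PLACES[parent][0]
    return False


def resolve(query):
    """Resolve pieces right to left, narrowing the candidate set at each step."""
    scopes = [None]  # current candidate ids; None means "anywhere"
    for piece in reversed(query.split(",")):
        key = normalize(piece)
        if not key:
            continue
        scopes = [
            pid
            for pid, (_, names) in PLACES.items()
            if key in names and any(is_within(pid, scope) for scope in scopes)
        ]
        if not scopes:
            return []
    return [s for s in scopes if s is not None]


print(resolve("Albuquerque, New Mexico, USA"))  # [4]
print(resolve("Albuquerque, USA"))              # [4]
```

Against a real database, each loop iteration would become one indexed query restricted to the previous step's ids, which is likely why this approach can stay fast compared with precomputing every combination.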

I have implemented this approach on Time.is (version 2.0, which will
be launched when time is ready), and my routine can find any place in
a fraction of a second. :)

Even

rkgeorge

Apr 14, 2009, 10:46:48 AM
to GeoNames, rkge...@cadmaps.com
An interesting side issue to "all possible names" is "in all
languages." Last week I was playing with Amazon's SimpleDB and Google
Language to see what could be done with a non-structured Cloud data
repository. One feature of this type of datastore is the ability to
add up to 1024 attributes under a single attribute name.

So "name" can contain:
Albuquerque, Nouveau-Mexique
Albuquerque Novo México
ニューメキシコ州アルバカーキ
美國新墨西哥州阿爾伯克基
البوكيرك (نيو مكسيكو)
Альбукерке, Нью-Мексико
.
.

Something similar could be done for your needs as well. I believe a
SimpleDB select query on the attribute 'name' would search against all
of the 'name' values in the collection, and they should all be
automatically indexed by SimpleDB.
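
The behaviour described here, where one attribute name holds many values and a query on that name can match any of them, can be emulated in memory with a reverse index from value to item. This is a hypothetical sketch and does not use the SimpleDB API; the ITEMS data and the build_value_index and find_items helpers are made up for illustration.

```python
from collections import defaultdict

# Hypothetical items, each with a multi-valued "name" attribute
# (several values stored under one attribute name).
ITEMS = {
    "albuquerque": {
        "name": [
            "Albuquerque, Nouveau-Mexique",
            "Albuquerque Novo México",
            "ニューメキシコ州アルバカーキ",
            "Альбукерке, Нью-Мексико",
        ],
    },
    "santa_fe": {
        "name": ["Santa Fe, New Mexico", "Санта-Фе, Нью-Мексико"],
    },
}


def build_value_index(items, attribute):
    """Map every value of `attribute` to the set of items that carry it."""
    index = defaultdict(set)
    for item_name, attrs in items.items():
        for value in attrs.get(attribute, []):
            index[value].add(item_name)
    return index


def find_items(index, value):
    """An item matches if ANY of its values equals the query value."""
    return index.get(value, set())


name_index = build_value_index(ITEMS, "name")
print(find_items(name_index, "Альбукерке, Нью-Мексико"))  # {'albuquerque'}
```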

Here is some more info on my experiments:
http://www.cadmaps.com/gisblog/?p=57
http://www.cadmaps.com/gisblog/?p=56

randy